Credit Card Churn Prediction

Description

Background & Context

Thera Bank recently saw a steep decline in the number of users of its credit cards. Credit cards are a good source of income for banks because of the various fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others apply only under specific circumstances.

Customers leaving the credit card service translates into a loss for the bank, so the bank wants to analyze customer data, identify which customers are likely to leave its credit card services and why, and improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

You need to identify the model that gives the best possible performance.

Objective

- Explore and visualize the dataset.
- Build a classification model to predict whether a customer is going to churn.
- Optimize the model using appropriate techniques.
- Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

Importing libraries

Loading Data

Data Overview

Let's check the number of unique values in each column
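A minimal sketch of this check with pandas, using a small hypothetical frame in place of the full churn dataset (the column names below are assumed from the data dictionary):

```python
import pandas as pd

# Hypothetical stand-in for the churn dataset
data = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"],
    "Gender": ["M", "F", "F"],
    "Customer_Age": [45, 49, 51],
})

# Number of unique values in each column
unique_counts = data.nunique()
print(unique_counts)
```

Columns with very few unique values are candidates for treatment as categorical features.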

Univariate analysis

Observation on Age

Observation on Credit Limit

Observations on Total Revolving Balance

Observations on Attrition_Flag

Observations on Gender

Observations on Education level

Observations on Marital Status

Observations on Card category

Apply a log transformation to skewed numerical columns
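A sketch of the transformation with NumPy, on a hypothetical `Credit_Limit` column; `np.log1p` (log of 1 + x) is used so that zero values are handled safely:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column for illustration
df = pd.DataFrame({"Credit_Limit": [1438.3, 34516.0, 3418.0, 12691.0]})

# log1p compresses the long right tail while preserving order
df["Credit_Limit_log"] = np.log1p(df["Credit_Limit"])
```

The same transformation can be applied to any of the skewed numerical columns identified during univariate analysis.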

Bivariate Analysis

Data Preparation for Modeling

Split data
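A minimal sketch of the split with scikit-learn, using synthetic data in place of the real features and the encoded `Attrition_Flag` target; stratifying on the target keeps the churn ratio the same in both splits:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for X (features) and y (churn flag, 0/1)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
```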

Model evaluation criterion

We will use Recall as the metric for model performance: a false negative (predicting that a customer will stay when they actually churn) means the bank loses a customer it could have tried to retain, so we want to identify as many potential churners as possible.
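A small illustration of recall with scikit-learn, where 1 denotes a churned customer:

```python
from sklearn.metrics import recall_score

# Recall = TP / (TP + FN): the share of actual churners we caught
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

# 3 of the 4 actual churners are identified
print(recall_score(y_true, y_pred))  # 0.75
```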

Oversampling and Undersampling the train data

Oversampling train data using SMOTE

Undersampling train data using Random Undersampler

Hyperparameter Tuning

Out of all 18 models, the AdaBoost, XGBoost, and GBM models trained on the unsampled data and tuned with RandomizedSearchCV will be taken forward.

First, let's create two helper functions, one to compute the different metrics and one to plot the confusion matrix, so that we don't have to repeat the same code for each model.
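A sketch of what such helpers might look like (the function names are placeholders; the confusion-matrix helper returns the matrix rather than plotting it, to keep the example self-contained):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def model_performance(model, X, y):
    """Compute accuracy, recall, precision, and F1 for a fitted model."""
    pred = model.predict(X)
    return {
        "Accuracy": accuracy_score(y, pred),
        "Recall": recall_score(y, pred),
        "Precision": precision_score(y, pred),
        "F1": f1_score(y, pred),
    }

def make_confusion_matrix(model, X, y):
    """Return the confusion matrix for a fitted model."""
    return confusion_matrix(y, model.predict(X))

# Quick demo on synthetic data
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
perf = model_performance(clf, X, y)
```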

RandomizedSearchCV

1. Tuning Unsampled AdaBoost Classifier
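A minimal sketch of the tuning step; the search space below is illustrative, not the notebook's actual grid, and the same pattern applies to the XGBoost and GBM models in the next sections:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced stand-in for the unsampled training data
X, y = make_classification(n_samples=500, weights=[0.84], random_state=1)

# Illustrative hyperparameter distributions
param_dist = {
    "n_estimators": [50, 100, 150],
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
}

search = RandomizedSearchCV(
    AdaBoostClassifier(random_state=1),
    param_distributions=param_dist,
    n_iter=5,
    scoring="recall",  # tune for the metric we chose earlier
    cv=3,
    random_state=1,
)
search.fit(X, y)
best_model = search.best_estimator_
```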

2. Tuning Unsampled XGBoost

3. Tuning Unsampled GBM (Gradient Boosting Classifier)

Comparing all 3 models from RandomizedSearchCV

Pipelines for productionizing the model
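One common way to productionize preprocessing and the chosen model together is a scikit-learn Pipeline; the column names below are hypothetical placeholders, and GradientBoostingClassifier stands in for whichever tuned model wins the comparison:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names for illustration
num_cols = ["Customer_Age", "Credit_Limit"]
cat_cols = ["Gender"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier(random_state=1)),
])

# Tiny demo fit on hypothetical data
df = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 32, 40, 58],
    "Credit_Limit": [1438.3, 34516.0, 3418.0, 12691.0, 4010.0, 9000.0],
    "Gender": ["M", "F", "F", "M", "M", "F"],
})
y = [0, 1, 0, 1, 0, 1]
pipe.fit(df, y)
```

Bundling the steps this way means new data passes through exactly the same imputation and encoding as the training data, and the whole object can be serialized as one artifact.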

Conclusion and Insights